Skip to content

test(amber): move state-mat e2e into amber-integration#5682

Merged
Yicong-Huang merged 9 commits into
apache:mainfrom
Yicong-Huang:test/state-mat-into-amber-integration
Jun 15, 2026
Merged

test(amber): move state-mat e2e into amber-integration#5682
Yicong-Huang merged 9 commits into
apache:mainfrom
Yicong-Huang:test/state-mat-into-amber-integration

Conversation

@Yicong-Huang

@Yicong-Huang Yicong-Huang commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

Two things:

1. Move the state-materialization e2e tests into amber-integration. The two e2e specs that previously ran in the deleted pyamber-state-materialization-mac diagnostic job now run under the existing amber-integration job, which already provisions postgres + iceberg catalog DB + MinIO + Lakekeeper and runs pytest -m integration as its last step. The test's catalog backend switches from sqlite SqlCatalog to the real postgres-backed JdbcCatalog, matching test_iceberg_document.py:45, so we exercise the prod catalog code path instead of an sqlite shim.

2. Add macOS to the amber-integration matrix. We want CI coverage that the integration stack actually runs on macOS (where most dev machines live), not just Linux. GitHub-hosted macOS runners have no Docker, so each docker-dependent step now branches on $RUNNER_OS: macOS provisions postgres / minio / lakekeeper natively (brew install + the upstream lakekeeper aarch64-apple-darwin tarball — published from v0.11.0 onward, same version as the Linux docker tag) while Linux keeps the existing docker path. Protoc on macOS uses brew's arm64-native protobuf because protoc 3.19.4 has no arm64-mac build and running its x86_64 binary under Rosetta breaks the python_betterproto plugin (arch / site-packages split between Rosetta'd protoc and arm64 setup-python). For proto3 sources, the plugin output depends on betterproto, not protoc, so the protoc version drift on macOS is benign for codegen.

Before After
pyamber-state-materialization-mac macOS diagnostic job (build.yml:742) deleted
SqlCatalog (sqlite) injected in module fixture real postgres-backed JdbcCatalog, matching test_iceberg_document.py:45
Test discovered by an explicit pytest -sv <path> from that job @pytest.mark.integration + picked up by amber-integration's pytest -m integration
amber-integration runs only on ubuntu-22.04 amber-integration runs on [ubuntu-22.04, macos-latest] with $RUNNER_OS-branched service provisioning

StorageConfig.initialize is wrapped in a class-scoped autouse fixture (rather than called at module import time) so it co-exists with test_iceberg_document.py's import-time initialize regardless of pytest collection order. All catalog + S3 credentials read the same STORAGE_* env vars the production code consumes (via storage.conf), with defaults that match storage.conf — so the test stays aligned with whichever identity the surrounding CI infra uses.

The unit-style test_process_start_channel_persists_produce_state_on_start_output that the deleted mac job also ran is untouched: it monkeypatches the output manager and is already picked up by the regular pyamber job's pytest -m "not integration" step.

Any related issues, documentation, discussions?

Closes #5681.

How was this PR tested?

Locally against the existing texera-dev infra (postgres on 5432 with texera_iceberg_catalog schema initialized via sql/iceberg_postgres_catalog.sql):

cd amber && pytest -m integration --junit-xml=/tmp/junit-integration.xml -sv
# 3 passed, 502 deselected  -- test_state_written_by_output_manager_is_replayed_by_reader,
#                              test_state_table_persists_across_writer_close,
#                              test_rest_catalog_round_trip

And the regular pyamber suite still green:

cd amber && pytest -m "not integration" -q
# 502 passed, 3 deselected

In CI, the amber-integration job picks these tests up automatically because they're marked @pytest.mark.integration, on both ubuntu-22.04 and macos-latest.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

@github-actions github-actions Bot added pyamber ci changes related to CI labels Jun 13, 2026
@codecov-commenter

codecov-commenter commented Jun 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.90%. Comparing base (99b9ca2) to head (59c5653).
⚠️ Report is 16 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5682      +/-   ##
============================================
- Coverage     53.09%   52.90%   -0.20%     
- Complexity     2546     2625      +79     
============================================
  Files          1076     1090      +14     
  Lines         41762    42208     +446     
  Branches       4513     4531      +18     
============================================
+ Hits          22174    22329     +155     
- Misses        18278    18569     +291     
  Partials       1310     1310              
Flag Coverage Δ *Carryforward flag
access-control-service 70.91% <ø> (-0.52%) ⬇️
agent-service 34.36% <ø> (ø)
amber 53.00% <ø> (-0.94%) ⬇️
computing-unit-managing-service 1.65% <ø> (ø)
config-service 56.71% <ø> (ø)
file-service 57.06% <ø> (ø)
frontend 47.86% <ø> (+0.34%) ⬆️
pyamber 89.77% <ø> (-0.95%) ⬇️
python 90.74% <ø> (ø) Carriedforward from 9753074
workflow-compiling-service 58.69% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 2 worse · ⚪ 11 noise (<±5%) · 0 without baseline

Compared against main 6becb85 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🔴 bs=10 sw=10 sl=64 434 0.265 22,101/34,168/34,168 us 🔴 +11.2% / 🟢 -7.1%
🟢 bs=100 sw=10 sl=64 938 0.573 105,323/126,264/126,264 us 🟢 -12.3% / 🟢 -9.7%
bs=1000 sw=10 sl=64 1,103 0.673 908,775/940,376/940,376 us ⚪ within ±5% / 🟢 -8.1%
Baseline details

Latest main 6becb85 from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 434 tuples/sec 439 tuples/sec 410.82 tuples/sec -1.1% +5.6%
bs=10 sw=10 sl=64 MB/s 0.265 MB/s 0.268 MB/s 0.251 MB/s -1.1% +5.7%
bs=10 sw=10 sl=64 p50 22,101 us 21,754 us 23,785 us +1.6% -7.1%
bs=10 sw=10 sl=64 p95 34,168 us 30,738 us 34,980 us +11.2% -2.3%
bs=10 sw=10 sl=64 p99 34,168 us 30,738 us 34,980 us +11.2% -2.3%
bs=100 sw=10 sl=64 throughput 938 tuples/sec 973 tuples/sec 891.94 tuples/sec -3.6% +5.2%
bs=100 sw=10 sl=64 MB/s 0.573 MB/s 0.594 MB/s 0.544 MB/s -3.5% +5.3%
bs=100 sw=10 sl=64 p50 105,323 us 100,392 us 112,277 us +4.9% -6.2%
bs=100 sw=10 sl=64 p95 126,264 us 144,023 us 139,802 us -12.3% -9.7%
bs=100 sw=10 sl=64 p99 126,264 us 144,023 us 139,802 us -12.3% -9.7%
bs=1000 sw=10 sl=64 throughput 1,103 tuples/sec 1,110 tuples/sec 1,041 tuples/sec -0.6% +6.0%
bs=1000 sw=10 sl=64 MB/s 0.673 MB/s 0.678 MB/s 0.635 MB/s -0.7% +5.9%
bs=1000 sw=10 sl=64 p50 908,775 us 899,717 us 972,714 us +1.0% -6.6%
bs=1000 sw=10 sl=64 p95 940,376 us 979,146 us 1,023,057 us -4.0% -8.1%
bs=1000 sw=10 sl=64 p99 940,376 us 979,146 us 1,023,057 us -4.0% -8.1%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,460.72,200,128000,434,0.265,22100.89,34168.24,34168.24
1,100,10,64,20,2132.07,2000,1280000,938,0.573,105322.65,126264.45,126264.45
2,1000,10,64,20,18126.89,20000,12800000,1103,0.673,908775.49,940376.32,940376.32

@Yicong-Huang Yicong-Huang force-pushed the test/state-mat-into-amber-integration branch 3 times, most recently from b9a13f2 to 9f13e26 Compare June 13, 2026 17:59
Drop the macOS-only `pyamber-state-materialization-mac` diagnostic job
and the sqlite-backed `SqlCatalog` override it relied on. Mark the
two e2e tests with @pytest.mark.integration so they run in the
`amber-integration` ubuntu job, which already provisions postgres +
iceberg catalog DB + MinIO + Lakekeeper and runs `pytest -m integration`
as its last step.

Catalog initialization now mirrors test_iceberg_document.py:45 —
postgres-backed JdbcCatalog with a tempdir warehouse — so we exercise
the real prod catalog code path instead of an sqlite shim. The
`StorageConfig.initialize` call is wrapped in a session-autouse
fixture (rather than at module import) so it co-exists with
test_iceberg_document.py's import-time initialize regardless of
pytest collection order.

The unit-style `test_process_start_channel_persists_produce_state_on_start_output`
that the mac job also ran is unaffected: it monkeypatches the output
manager and is already picked up by the regular `pyamber` job's
`pytest -m "not integration"` step.

Closes apache#5681
@Yicong-Huang Yicong-Huang force-pushed the test/state-mat-into-amber-integration branch from 9f13e26 to 009d9e4 Compare June 13, 2026 18:43
Yicong-Huang and others added 3 commits June 13, 2026 14:53
Signed-off-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
The macos-latest matrix entry was failing at "Set up job" / "docker:
command not found" because GitHub-hosted macOS runners have no
Docker and `services:` containers are Linux-only. Provision the same
stack natively on macOS:

  * postgres@16 via brew, with a `postgres` superuser created so the
    psql steps stay OS-agnostic;
  * minio via `brew install minio`, backgrounded with nohup;
  * lakekeeper from the upstream aarch64-apple-darwin tarball
    (published v0.11.0 onward — same version as the Linux docker tag);
  * minio-mc via brew for the warehouse bucket-init step;
  * protoc release asset switched on \$RUNNER_OS.

The job-level `services.postgres` block is also removed since it's a
Linux-only feature; both branches now start postgres via a step.

Each docker-dependent step branches on \$RUNNER_OS inside its `run:`
script so the YAML keeps one step per concept rather than
Linux/macOS pairs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
protoc 3.19.4 predates upstream arm64-mac builds — the previous
osx-aarch_64 URL 404'd. GitHub's Apple Silicon macOS runners ship
Rosetta 2 preinstalled, so the x86_64 binary runs transparently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@aglinxinyuan aglinxinyuan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is correct. I will review it after the CI passes.

Yicong-Huang and others added 3 commits June 13, 2026 16:32
protoc's import resolver walks each -I in order. With `-I=/usr/local/include`
listed first, ambermessage.proto's `import "org/apache/.../controlcommands.proto"`
caused protoc to stat `/usr/local/include/org/apache/...`. On Linux that
path returns ENOENT and protoc falls through to the next -I; on macOS
the unzip-created /usr/local/include lacks +x for the runner user, so
the stat returns EACCES and protoc aborts.

Widen the existing chmod from `/usr/local/include/google` to
`/usr/local/include` so the parent dir is traversable on both OSes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The macOS run dies with `protoc-gen-python_betterproto: Plugin failed
with status code 1` and no plugin stderr is surfaced. Wrap the script
so a failure dumps python/plugin paths, attempts to import
betterproto, and reports protoc's arch — enough to tell whether
the plugin is missing, mis-imported, or hitting an arch mismatch.
Remove this once the macOS path is stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rch mismatch

Diagnostics from the last run revealed:
  * /usr/local/bin/protoc was Mach-O x86_64 (running under Rosetta);
  * protoc-gen-python_betterproto lived in
    /Users/runner/hostedtoolcache/Python/3.11.9/arm64/bin/ (arm64-only
    setup-python);
  * the actual betterproto package was installed into the universal
    Framework Python at /Library/Frameworks/Python.framework/...

The x86_64-under-Rosetta protoc could exec the arm64 plugin shim, but
the arch/site-packages split meant the import inside the plugin landed
in a Python that didn't have betterproto wired up the same way, hence
"Plugin failed with status code 1" with no stderr surfaced.

Switching macOS to brew's arm64-native protobuf keeps the protoc /
plugin / python triad on a single arch (arm64), sidestepping all of
this. The version drifts from the pinned 3.19.4 used on Linux, but
python_betterproto's output for proto3 sources is determined by the
betterproto package version, not protoc — so the python codegen
stays bit-identical across the two paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Yicong-Huang

Copy link
Copy Markdown
Contributor Author

The change is correct. I will review it after the CI passes.

@aglinxinyuan CI looks green now. Could you please check again?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the Python state-materialization end-to-end tests to run as part of the amber-integration CI job (instead of a separate macOS-only diagnostic job), and updates the test setup to exercise the production-like Iceberg catalog path.

Changes:

  • Updated test_state_materialization_e2e.py to use a postgres-backed Iceberg JdbcCatalog path, added @pytest.mark.integration, and moved initialization into an autouse fixture.
  • Refactored amber-integration in .github/workflows/build.yml to provision Postgres/MinIO/Lakekeeper on both Linux and macOS runners, and removed the dedicated pyamber-state-materialization-mac job.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
amber/src/test/python/core/architecture/packaging/test_state_materialization_e2e.py Switches e2e test from sqlite catalog override to postgres-backed catalog and marks it as integration.
.github/workflows/build.yml Removes the old macOS diagnostic job and updates amber-integration infra provisioning / OS matrix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/build.yml
Yicong-Huang and others added 2 commits June 14, 2026 20:01
…-gen diagnostics

Address Copilot review:

* Replace hardcoded S3 credentials (minioadmin/us-east-1) with
  os.environ.get(...) reads that match the same STORAGE_S3_* env vars
  the production code consumes via storage.conf, defaulting to the
  storage.conf defaults (texera_minio/password/us-west-2). The current
  test payload doesn't actually push large binaries, but the previous
  hardcoded creds would have diverged from the CI infra the moment
  any code-path under test reached for S3 — better to align now.

* Promote the _init_storage_config fixture from function-scope to
  class-scope. It force-resets the StorageConfig + IcebergCatalogInstance
  singletons and allocates a fresh tempdir warehouse — once per class
  is sufficient and avoids leaking N tempdirs / N reinitializations
  across the two tests in this class.

Also drop the temporary diagnostics block around `bash bin/python-proto-gen.sh`
in the amber-integration job. It was wired up to surface python /
plugin / arch info on macOS proto-gen failures; the brew protobuf
switch landed two commits ago and the macOS path is now green, so
the safety net is no longer pulling its weight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI ran `ruff format --check` and flagged the file from the previous
commit (long os.environ.get(...) lines were wrapped vs joined).
No semantic change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Yicong-Huang Yicong-Huang enabled auto-merge June 15, 2026 03:17
@Yicong-Huang Yicong-Huang added this pull request to the merge queue Jun 15, 2026
Merged via the queue into apache:main with commit 3242ac9 Jun 15, 2026
26 checks passed
@Yicong-Huang Yicong-Huang deleted the test/state-mat-into-amber-integration branch June 15, 2026 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci changes related to CI pyamber

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move state-materialization e2e tests into amber-integration (drop sqlite SqlCatalog and the macOS-only diagnostic job)

4 participants